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Assessing luminosity correlations via cluster analysis: Evidence for 
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ABSTRACT 

The radio:X-ray correlation for hard and quiescent state black hole X-ray binaries is critically 
investigated in this paper. New observations of known sources, along with newly discovered 
ones (since 2003), have resulted in an increasingly large number of outliers lying well out- 
side the scatter about the quoted best-fit relation. Most of these outliers tend to cluster below 
the best fit line, possibly indicative of two distinct tracks which might reflect different ac- 
cretion regimes within the hard state. Here, we employ and compare state of the art data 
clustering techniques in order to identify and characterize different data groupings within the 
radio:X-ray luminosity plane for 18 hard and quiescent state black hole X-ray binaries with 
nearly simultaneous multi-wavelength coverage. Linear regression is then carried out on the 
clustered data to infer the parameters of a relationship of the form £ T — a + j3 l x through 
a Bayesian approach (where £ denotes logarithmic luminosities). We conclude that the two 
cluster model, with independent linear fits, is a significant improvement over fitting all points 
as a single cluster. While the upper track slope (0.63 ± 0.03) is consistent, within the errors, 
with the fitted slope for the 2003 relation (0.7 ± 0.1), the lower track slope (0.98 ± 0.08) is 
not consistent with the upper track, nor it is with the widely adopted value of ~ 1.4 for the 
neutron stars. The two luminosity tracks do not reflect systematic differences in black hole 
spins as estimated either from reflection-, or continuum-fitting method. Additionally, there is 
evidence for at least two sources (H1743— 322 and GRO J1655— 500) jumping from the lower 
to the upper track as they fade towards quiescence, further indicating that black hole spin does 
not play any major role in defining the radio loudness of compact jets from hard black hole 
X-ray binaries. The results of the clustering and regression analysis are fairly insensitive to 
the selection of sub-samples, accuracy in the distances, and to the treatment of upper lim- 
its. Besides introducing a further level of complexity in understanding the interplay between 
synchrotron and Comptonised emission from black hole X-ray binaries, the existence of two 
tracks in the radio:X-ray domain underscores that a high level of caution must be exercised 
when employing black hole luminosity-luminosity relations for the purpose of estimating a 
third parameter, such as distance or mass. 

Key words: Black hole physics - Accretion, accretion discs - ISM: jets and outflows - X- 
rays: binaries - Radio continuum: general - Methods: statistical 



1 INTRODUCTION 

The existence of a tight, non-linear correlation between the ra- 
dio and X-ray flux of hard state black hole X-ray binaries over a 
wide dynamic range was first reported by Corbel et al. (2003) for 
GX339— 4 (see also Hannikainen et al. 1998). Re-analyzing a se- 
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ries of quasi-simultaneous observations for a sample of 10 hard 
state black holes, collected over different epochs and with different 
instruments, Gallo et al. (2003, hereafter GFP03) concluded that 
the same relation, with a slope of (0.7 ± 0.1) (i) was a general 
property of hard state black holes; (ii) had similar normalizations 
for all sources, or in other words, it could be generalized for radio 
and X-ray luminosities all the way from quiescence up to a few per 
cent of the Eddington X-ray luminosity, above which a transition 



2 E. Gallo, B. Miller & R. Fender 



to the thermal dominant state (see, e.g., McClintock & Remillard 
2006 for a review of X-ray states) is typically observed, along with 
quenching of the core radio emission (Fender et al. 1999; Russell 
et al. 201 1; see Fender 2006 for a review of radio properties of X- 
ray binaries). With the inclusion of the nearby quiescent black hole 
binaries A0620— 00 (the faintest radio:X-ray detection on the re- 
lation; Gallo et al. 2006) and V404 Cygni at its lowest luminosity 
level (Corbel et al. 2008), the slope has been more recently revised 
to a somewhat shallower value (0.58 ± 0.16). 

As radio and X-ray luminosities are thought to trace differ- 
ent emission processes - namely self-absorbed synchrotron from 
downstream in the jet vs. inverse Compton/optically thin syn- 
chrotron from the compact corona/jet base, respectively - the slope 
of the correlation has been interpreted as evidence that a dominant 
fraction of the X-ray spectrum of hard state black hole X-ray bina- 
ries is due to optically thin synchrotron emission from the jet base 
(Markoff et al. 2003). Broad-band (radio-to-X-ray) spectral fitting 
provides independent evidence that this might be the case for a vari- 
ety of weakly accreting black holes, both stellar and super-massive 
(Markoff et al. 2001, 2004, 2005, 2008; Maitra et al. 2009, 2011; 
Migliari et al. 2007; Russell et al. 2010; Malzac et al. 2004 - see 
e.g. Corbel et al. 2004 for a counter-example, however), although 
other possible interpretations of the scaling relation remain open 
(e.g. Maccarone 2005). Regardless of its physical explanation, the 
non-linearity of such 'universal' scaling implies the existence of 
an accretion regime where most of the accretion power is released 
(likely in the form of mechanical power) within the jet, as opposed 
to being dissipated locally within the accretion-corona system (as 
discussed in GFP03, this is simply as a result of the empirical 0.7 
slope combined with the theoretical prediction that the radio power 
and the total jet power correlate with a 1.4 power law slope in par- 
tially self-absorbed jet models; cf. Blandford & Konigl 1979). 

Migliari & Fender (2006) subsequently reported on a similar 
investigation for the neutron star systems, showing that a steeper 
(albeit less tight) correlation exists between the radio and X-ray 
luminosity in neutron stars, too, of the form L r oc L-^ 4 . Inciden- 
tally, the difference in the inferred slopes for the black holes and 
the neutron stars nicely combine such that, unlike the black holes, 
neutron stars never enter the 'jet-dominated' regime. If so, that 
could be sufficient to account for the observed gap in luminosity 
between quiescent black holes and neutron stars, without requiring 
the presence/absence of a solid surface (Fender et al. 2003; see also 
Kording et al. 2006a). 

A qualitatively similar relation, with a slope of (0.6±0.1), has been 
reported between the optical-infrared (IR) and the X-ray luminos- 
ity of both black hole X-ray binaries and low mass neutron stars 
X-ray binaries while in the hard state (Homan et al. 2005, Russell 
et al. 2006, Coriat et al. 2009), indicating a non-negligible contribu- 
tion from the jet to the near-infrared flux in these systems, although 
the relative contributions from jet vs. reprocessed disc emission 
vary greatly between the two types of sources. Interestingly, there 
appears to be a hysteresis effect in the near-IR:X-ray correlation 
of hard state black hole X-ray binaries, namely, at a given X-ray 
luminosity, the near-IR emission appears to be weaker during the 
rise of an outburst than during the decline of an outburst (Russell et 
al. 2007). 

Expanding on those works, two separate groups examined 
the possibility that similar scalings hold for super-massive black 
holes in Active Galactic Nuclei (AGN) of the radio-weak variety, 



i.e. the AGN sub-class whose properties most closely resemble 
those of hard state black hole X-ray binaries (e.g. Maccarone et 
al. 2003, Kording et al. 2006b). Merloni et al. (2003) and Falcke et 
al. (2004), independently came to the same conclusion, that is, that 
weakly accreting black holes - from stellar mass scales to super- 
massive - preferentially lie on a plane in the three-dimensional pa- 
rameter space defined by their mass, nuclear X-ray luminosity and 
radio luminosity (see Kording et al. 2006c, Merloni et al. 2006, 
Li et al. 2008, Gultekin et al. 2009, Yuan et al. 2009, Miller et 
al. 2010 for more recent studies). While the physical explanation 
for this finding is non trivial (see e.g. Heinz 2004, Heinz & Merloni 
2004, Plotkin et al. 2012), the best-fit relation for this 'fundamen- 
tal plane of black hole activity' may be employed for estimating or 
predicting one of the three observables (most notably mass) based 
on measurements of the remaining two (see e.g. Reines et al. 2011 
and Miller & Gultekin 2011 for two recent as well as remarkable 
examples). 

Going back to the stellar mass black holes in X-ray binaries, mea- 
surements of radio and X-ray flux for newly discovered sources are 
similarly employed to estimate their most likely distance based on 
their location with respect to the best-fit relation in the radio/X-ray 
domain (e.g. Cadolle-Bell et al. 2007, Paizis et al. 201 1, Rodriguez 
et al. 201 1). Despite its fairly broad impact, though, the initial pic- 
ture of a tight universal radio:X-ray luminosity relation has slowly 
drifted toward a more complex scenario. An increasing number of 
'outliers' (Gallo 2007) have been reported, defined as falling well 
outside the scatter about the original GFP03 best-fit relation (see 
next section for references). Most of these points tend to be clus- 
tered below the best-fit relation (i.e. at lower radio luminosities for 
a given X-ray luminosity), leading to the suggestion that there may 
be two separate tracks in the radio/X-ray luminosity plane of black 
hole X-ray binaries, each representing a different mode of accre- 
tion within the hard state (e.g. Coriat et al. 201 1; see also Rushton 
et al. 2010, restricted to GRS 1915+105). 

Motivated by Occam's razor, this paper sets out to give a sta- 
tistically rigorous answer to the question whether, in the face of an 
increasing number of outliers and uncomfortably large scatter, it 
still makes sense to refer to a universal radio/X-ray correlation for 
hard state black holes X-ray binaries (hereafter, BHBs). The reader 
is referred to Xue & Cui (2007) for a different approach. 



2 SAMPLE DESCRIPTION 

We consider a sample comprised of multiple, quasi-simultaneous 
(i.e. within a day) radio and X-ray observations of 18 black hole 
X-ray binary systems while in the hard X-ray state, for a to- 
tal of 166 data points (157 detections and 9 upper limits in ra- 
dio luminosity). In addition to the original sample of hard state 
BHBs discussed in GFP03 (9 sources excluding Cygnus X-lTj, 
the following 9 are included: XTE J 1650- 500 (data from Cor- 
bel et al. 2004); XTE J1908+094 (data from Jonker et al. 2004); 
A0620-00 (data from Gallo et al. 2005); XTE J1720-318 (data 
from Brocksopp et al. 2005, 2010); GRO J1655-40 (data from 



1 Data points relative to the high mass BHB Cygnus X- 1 are excluded from 
this analysis (as they were from GFP03) on the basis of its repeated 'failed' 
state transitions. See e.g. Rushton et al. (2012), discussing the peculiar radio 
behavior of this source. 
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Figure 1. The full dataset includes 165 hard-state observations of 18 different XRBs, with radio and X-ray luminosities as indicated. Cygnus X-1, GRS 
1915+105, along with the radio upper limits (combined with strictly non-simultaneous X-ray detections) by Miller- Jones et al. (201 1) are plotted for illustrative 
purposes but are not included in our analysis (see text). £ denotes log luminosities in units of erg s _1 . For the 18 sources under analysis, the inset illustrates 
luminosities for the sources with secure distance measurements, in red, vs. uncertain distance measurements, in blue. 



Migliari et al. 2007 & Calvelo et al. 2010); IGR J17177-3656 
(data from Paizis et al. 2011); Swift J1753.5-0127 (data from 
Cadolle-Bel et al. 2007 & Soleri et al. 2010); H1743-322 (data 
from McClintcok et al. 2009, Jonker et al. 2010, Coriat et al. 201 1); 
IGR J17091-3624 (data from Rodriguez et al. 201 1). In addition, 
new data points are included for 4U 1543—47 (data from Kalemci 
et al. 2005) and V404 Cygni (data from Corbel et al. 2008). A 
plot of radio luminosities (calculated by integrating up to the ob- 
served frequency and assuming a flat spectrum) versus 1-10 keV 
X-ray luminositiesnfor the 1 8 hard state BHBs under consideration 
is shown in FigureTT] whose label also includes the adopted dis- 
tances. Data points from Cygnus X-1 and GRS 1915+105 are also 
included for comparison, along with the radio upper limits from a 
recent deep survey of quiescent BHBs carried out by Miller- Jones 
et al. (2011, M-Jll hereafter), but for which simultaneous X-ray 
counterparts are based on archival X-ray detections, often from ob- 
servations that were taken several years prior to the radio survey. 

Several authors (e.g. Rushton et al. 2010, Jonker et al. 2010, 
Coriat et al. 2011, Ratti et al., submitted) have suggested that 
there appears to be dual upper and lower luminosity tracks in this 
domain, particularly when compared against figure 2 of GFP03, 



2 For consistency with GFP03, unabsorbed luminosities quoted in dif- 
ferent energy bands have been converted to 1-10 keV luminosities using 
WebPimms and assuming a power law spectrum with index T = 2. 



where a single correlation was identified. For descriptive purposes, 
the divide can be loosely set by the H T = £ x — 7 line, where £ 
denotes log luminosities in units of erg s _1 . We investigate the 
effective validity of such a split classification in the following sec- 
tions. 



When available, luminosities have been updated based on re- 
cent and or more accurate distance measurements (based on the 
references in Table 2 of Fender et al. 2010 and Jonker & Nelemans 
2004, except for the distance to XTE J 1550-564 which has been 
recently revised to 4.4 kpc; cf. Orosz et al. 2011). Seven of out 
the 18 sources considered here (i.e. H1743-322, XTE J1650-500, 
XTE J1720-318, XTE J1908+094, Swift J1753.5-0127, IGR 
J17177— 3656 and IGR J17091-3624) have less than reliable dis- 
tance values, in that their distance determinations are based on 
loose absorption arguments, or, in some instances, on their very lo- 
cation on the radio/X-ray luminosity plane investigated here. We 
refer to this subsample as 'uncertain distance' sample (vs. 'se- 
cure distance sample'). The inset of Figure 1 illustrates the loca- 
tions of the two samples: note that culling objects with uncertain 
distance measurements removes the majority of the lower track 
data points (71 out of 77), although those six points that remain 
come from 3 distinct objects with secure distances (GRO 1655—40, 
XTE 1550-564 and GRS 1758-258). 
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Figure 2. Model-based clustering results for n=2,3,4,5 clusters (mclust 
code). Distributions are typically ellipsoidal, with varying enclosed area 
and orientations. Labels give the Bayesian Information Criterion (BIC) ans 
sclaled-BIC values; models with the lowest (i.e. =1) scaled-BIC are pre- 
ferred. 



3 CLUSTERING ANALYSIS 

For the sample described above, we address three general ques- 
tions: (i) whether there are necessarily two (or more) clusters pop- 
ulated within the l T — £ x plane; (ii) if so, what function optimally 
characterizes the clustering, and with which cluster is each individ- 
ual observation most likely associated; (iii) given these groupings, 
is the single linear regression model derived from all observations 
consistent with the best-fit lines within each cluster, and to what 
degree, if any, is the distinct-trends model statistically preferred. 
We wish to stress that the clustering analysis can only handle de- 
tections; we include the treatment of the upper limits in the linear 
regression analysis. 

Potential clustering of observations within the radio/X-ray 
luminosity plane is investigated using standardized, x:y coordi- 
nates (as opposed to raw L r :L x ). 'Standardization' is a way of 
scaling variables so that different variables, measured in different 
units and/or over different dynamic range, can be more easily 
compared. While both radio and X-ray luminosity for our sample 
are measured in erg s _1 , they are integrated over different energy 
ranges and likely account for different fractions of the bolometric 
luminosity emitted via their underlying emission process; in 
addition, the data set tends to stretch along the L x axis. The first 
transformation (Lx — > X 1 , L r — » y) includes taking the logarithm, 
subtracting the mean and dividing by the standard deviation, such 
that the resulting range of values for the primed coordinates is 
comparable along the x and y' axis, while they both end up 
centered at x' — y' — 0. Principal Component Analysis (PC A; 
e.g. Jolliffe et al. 2002) applied to x' and y' identifies the major 
component vector (essentially parallel to the upper track as defined 
above) to account for 87% of the data variability. The data are 
rotated accordingly around their mean, such that they end up align- 
ing with the principal component. The rotated variables are again 
scaled to unit variance, and clustering analysis is finally carried out 
on the resulting vectors, which we will refer to as x and y hereafter 
(see Figured. This ensures comparable sensitivity (in terms of the 
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Figure 3. Scaled BIC values (i.e. divided by the BIC value corresponding to 
the best-fit cluster model) forn = 1, 2, 3, ..6 model-based clusters; dashed 
and dot-dashed lines are 1±<t over the interval marked by the dotted seg- 
ment. A measure of the preferred number of clusters is given by the leading 
edge of the plateau in scaled BIC. With (without) XTE Jl 118+480, three 
(two) clusters are preferred. 



ability to identify clusters) both along and across luminosity trends. 



3.1 Model-based clustering 

We first use mclust, a multivariate normal mixture modeling and 
model-based clustering code (Fraley & Raftery 2002, 2007), to 
evaluate the presence of n clusters, where n is allowed to vary 
from one to six. The considered models are spherical, diagonal, or 
ellipsoidal distributions (permitted to vary in area, shape, and ori- 
entation, where applicable), and models are ordered by likelihood 
as estimated using the Bayesian Information Criterion (BIC). BIC 
resolves the problem of 'over-fitting' (increasing the likelihood of a 
model by adding parameters) by introducing a penalty term for the 
number of parameters in the model (it can be thought as an analog 
to reduced \ 2 statistics in a clustering analysis framework). Given 
a series of models, the one with the higher BIC value is the one to 
be preferred (note that, by construction, BIC<0). 

Figure 2 illustrates the results of this exercise by showing dif- 
ferent clustering for n = 2, 3, 4, 5 (bottom right, bottom left, top 
right and top left, respectively). At face value, three clusters, with 
ellipsoidal distributions, are preferred by the data (with a best-fit 
BIC value of —695). For simplicity, the estimated BIC values can 
be scaled to the best-fit BIC, such that a scaled-BIC of unity cor- 
responds to the best model. Figure 15] illustrates the variation of 
such scaled BIC value as a function of number of clusters for the 
full/secure distance samples. In this representation, a measure of 
the preferred number of clusters is given by the leading edge of 
the plateau in scaled BIC. While the two cluster model (essentially 
identical to the upper and lower tracks as defined in Section 1) is 
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Figure 4. Clustering derived from affinity propagation {apcluster) with ini- 
tial preferences p=— 100 and p=— 70, respectively, for the top and bottom 
panel. The damping factor A is set to 0.9 for both cases. The exemplars (i.e. 
cluster centers as identified by the code) are marked with squares. 



associated with a slightly higher (1.18 vs. 1) scaled-BIC compared 
to the three clusters, it is clear that two clusters provide a better de- 
scription of the data set over a single cluster model (BIC value 1.18 
vs. 1.3, respectively). In addition, the three clusters consist of the 
upper and lower tracks as well as a separate grouping containing 
XTE Jl 118+480 points, which is likely distinguished by the par- 
ticularly dense sampling over a very narrow dynamic range for this 
object, rather than any physical characteristics (for fitting purposes, 
the same XTE Jl 1 18+480 data points discussed here were actually 
averaged to a single data point in GFP03). 

While four and five clusters (splitting the upper and then lower 
track) do not provide an improvement over three, when restricting 
consideration to the subset of objects with secure distance measure- 
ments, the preferred three cluster model again isolates most of the 
XTE Jl 1 18+480 points but now splits off the high-luminosity seg- 
ment of the upper track while merging the low-luminosity segment 
of the upper track with the (here sparsely populated) lower track. 

In summary, this method identifies three clusters - one cor- 
responding entirely to the low-dynamic range/high cadence XTE 
Jl 1 18+480 data - as preferred for the full dataset and likewise for 
the subset of objects with secure distances, although in the latter 
case the groupings are not obviously related to the upper and lower 
tracks. When XTE Jl 118+480 is averaged to a single data point, 
two clusters (the upper and lower tracks) are preferred for the full 
dataset, and perhaps also for the best distances subset (splitting 
off the high-luminosity segment of the upper track), although here 
three clusters (now separating out the lower track) does still have a 
slightly better BIC value. 

An independent method of assessing clustering significance 
(also without the necessity of fixing a priori the clustering to 
be tested) is provided by the heirarchical Bayesian model imple- 
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Figure 5. Dendrogram constructed through top-down hybrid hierarchical 
clustering, preserving mutual clusters (hybrydHclust). The inset shows the 
corresponding two and three cluster splits (which have been adopted for the 
linear regression analysis). 



mented by Fuentes & Casella (2009; bayesclust). Bayesclust allows 
to first test the null hypothesis (equivalent to just one cluster). It 
then procedes to test the alternative hypothesis to be either 2, 3 or 4 
clusters at any one time. The assumed distribution for the observa- 
tions is multivariate normal, and we use the default non-informative 
priors on a (the variance) and /j, (the mean for each cluster), with 
a minimum cluster size of 10% of objects (i.e., 16 data points) en- 
forced. Bayesclust adopts an importance sampling estimate for the 
Bayes factor by sampling from the partition distribution g(iS). The 
significance of clustering is then assessed through a frequentist ap- 
proach, i.e. by simulating the distribution of the Bayes factors under 
the null hypothesis of a single cluster. The corresponding p-value, 
which gives a measure of the likelihood for 1 cluster, is is 0.003, 
indicating again that two clusters are preferred over one. The sug- 
gested partitioning, however, is somewhat different from what iden- 
tified by mclust, in that a small fraction of the data points on the 
high luminosity portion of the bottom track as identified by mclust 
are identified as members of the top track by bayesclust. 



3.2 Optimal clustering of observations 

We further test and verify the model-based clustering presented 
above through applying three additional approaches: partitioning 
around medoids (Kaufman and Rousseeuw 1990; pom), affinity 
propagation through iterative maximization of net similarity (Frey 
& Dueck 2007; implemented by Bodenhofer et al. 201 1 as apclus- 
ter), and hybrid hierarchival clustering using mutual clusters (Chip- 
man & Tibshirani 2006; hybridHclust). 

Both the k-means and k-medoids algorithms are partitional, 
i.e. break the dataset up into groups, and both minimize the distance 
between points labeled to be in a cluster and a point designated as 
the center of that cluster, although partioning around medoids is 
considered a more robust version of the standard fc-means cluster- 
ing algorithm in that it is more robust to noise and outliers. Pam is 
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a classical partitioning technique that clusters the dataset of n ob- 
jects into k clusters, where k is a user-supplied parameter. When 
two clusters are requested, the code again separates out the upper 
and lower tracksj As before, the most luminous points are included 
with the upper track, although they appear potentially also consis- 
tent with the lower track. When three clusters are requested, the 
upper track splits, but the high-luminosity end of the lower track 
also joins the high-luminosity segment of the upper track. 

Apcluster (Frey & Dueck 2007) is a fairly new algorithm that 
takes as input measures of 'similarity' between pairs of data points 
and simultaneously considers all data points as potential cluster 
centers. In complex multi-dimensional problems, such as grouping 
images of faces, of genes identification, affinity propagation signif- 
icantly out-performs even the best result of multiple fc-means runs. 
Cluster centers are usually found by randomly choosing an initial 
subset of data points and then iteratively refining it, making the fi- 
nal result somewhat dependent on whether the initial choice is a 
fair representation of the whole data set. Instead, affinity propaga- 
tion identifies exemplars (i.e. cluster centers) via 'message passing' 
between points*] Messages are exchanged between data points until 
a high-quality set of exemplars and corresponding clusters emerge. 
Within the algorithm framework, adding a tiny amount of noise to 
the similarities to prevent degenerate situations (oscillations) cor- 
responds to increasing the damping factor A (allowed to vary be- 
tween and 1). High values of the preference p will cause affinity 
propagation to find many exemplars, while low values will lead 
to a small number of exemplars. Going back to our data set, low 
initial preference (p=— 100) identifies again the upper and lower 
tracks; with higher preference (p=— 70), the split occurs midway 
along and transverse to the upper track. Changing the damping fac- 
tor from A=0.9 to A=0.5 results in a slightly different membership 
for the 3 cluster model. 

Finally, we test a hybrid hierarchical clustering method that 
uses mutual clusters, where a mutual cluster is composed of points 
having a maximum relative distance less than the shortest distance 
to any point outside the group. HybridHclust relies on the fact 
that bottom-up clustering cannot break a mutual cluster. That is, 
when agglomerating points, these method will never add points 
outside the mutual cluster before first joining all points inside 
the mutual cluster. The resultant top-down clusterings are then 
stitched together to form a single top-down clustering diagram, 
known as 'dendrogram', which allows to identify the number 
and composition of the clusters. The dendrogram for our data set 
is shown in Figure B] with insets illustrating the two and three 
cluster branches. The two primary clusters (splitting at a height of 
300) are identified with the upper and lower tracks. The next two 
sub-clusters (splitting at a height of 140) dissociate the upper track 
into higher and lower luminosity portions. 



3 Selecting from £ x , £ T rather than from x, y forms instead two groups split 
near£ x = 36.3. 

4 Two kinds of message are exchanged: (i) the 'responsibility' r(i, k) (sent 
from data point i to candidate exemplar point k) reflects the accumulated 
evidence for how well-suited point k is to serve as the exemplar for point 
i, and (ii) the 'availability' a(i, k) (sent from k to i), which reflects the 
evidence for how appropriate it would be for point i to choose point k as 
its exemplar, taking into account the support from other points that point k 
should be an exemplar (from Frey & Dueck 2002). 



Synthesizing the results from the two previous Sections, we 
conclude that two physically meaningful clusters are indeed sig- 
nificantly present in the radio/X-ray domain of hard state BHBs, 
loosely corresponding to the upper and lower tracks as identified in 
Section 2, although the specific characteristics of the lower track in 
the full dataset are influenced by objects with uncertain distances. It 
is worth noticing that the six most luminous points in the plot (cor- 
responding to ^ r >31 and £ x >37 in FigurefTJ are identified as mem- 
bers of the upper track by all clustering algorithms, whereas they 
could easily be classified as the upper luminosity end of the lower 
track in a more arbitrary 'by-eye' classification. In order to assess 
the robustness of our clustering analysis on measurement errors - 
which are not included in any of the routines discussed above - we 
verified that randomly scrambling the data points within a Gaussian 
of a — 0.15 in £ x and £ r (i.e., a factor of 0.7-1. 4 in flux, which may 
account for intrinsic source variability, lack of strict simultaneity, 
as well as measurement errors) still returns two clusters as the pre- 
ferred model, and that the cluster membership is preserved. 

We adopt the results described above to populate the cluster 
members for the linear regression analysis described in the follow- 
ing Section. The two cluster model - as identified by all methods 
except for bayesclust - corresponds to the lower and upper tracks 
in, e.g., the top inset of Figure B] while the three cluster model 
typically subdivides the upper track, although it should be kept in 
mind that various methods suggest somewhat different three-cluster 
splits (apcluster and hybridHclust return exactly the same group- 
ings for the three cluster model, as shown in the bottom panel of 
Figure H] and the bottom inset of Figure pj respectively). For the 
two cluster model, the various algorithms (mclust, apcluster, pam 
and hybriHclust) assign 86 and 71 data points to the upper and 
lower clusters, respectively. Restricting the selection to the secure 
distance sub-samples returns 74 and 6 points. 



4 LINEAR REGRESSION WITHIN CLUSTERS 

We procede to carry out linear regression on (£ T — 29.5) = a + 
/3(lx — 36.6), with intrinsic random scatter included. The center- 
ing is based on the median luminosities in the full sample. Two 
Bayesian modeling packages are used to determine the parameters, 
the first from Kelly (2007; implemented in IDL as Unmix _err.pro), 
and the second from Hall (201 1; implemented in R as LaplacesDe- 
mo/fj, which serves for comparative purposes with analysis per- 
formed on the detections only. 

The Kelly (2007) code permits input of measurement errors 
and optional dependent variable censoring. We assume uncertain- 
ties on both £ x and t T of 0.15 dex. The radio upper limits are in- 
cluded in all fits, but fitting is also carried out using only the de- 
tected points for comparison and for \ 2 tests. Three Gaussians are 
used in the independent variable mixture modeling, and a mini- 
mum of 5,000 iterations are performed with Gibbs sampling. The 
most likely parameter values are estimated as the median of 10,000 
draws from the posterior distribution, with credible intervals corre- 
sponding to lcr errors calculated as the 16th and 84th percentiles. 
The best-fit models for each cluster are calculated separately. 
The Hall (2011) code provides a framework for using adaptive 



http://www.statisticat.com/laplacesdemon.html 
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Figure 6. Results from the linear regression analysis, performed with the 
Bayesian modeling package of Kelly (2007). The solid (dashed) lines are 
computed for all data points (detections only). Top: full dataset. Bottom: fit 
to two clusters. See FigurelTlfor details on the best-fit parameters. 



Markov Chain Monte Carlo to optimize parameters for a user- 
specified model and priors. It constructs a linear model for which 
the prior probabilities of a and /3 within each considered cluster are 
taken to be Gaussian with mean zero. After each set of iterations 
the program conducts self-diagnostics and if necessary suggests an 
additional run with modified sampling techniques to ensure con- 
vergence. The final runs consisted of 40,000-100,000 iterations, 
thinned to retain 1,000 points. The most likely parameter values 
are again estimated as the medians. For the purpose of this work, 
the Hall code is employed in order to compare the results from 
the Kelly code with detections only (this is because the Hall code 
would require modifications in order to deal with upper limits and 
simultaneously evaluate the goodness of fit). 

We consider three models: one cluster, two clusters, or three 
clusters - the latter as identified by hybridHclust and apcluster. 
The various models are compared through both % 2 versus degrees- 
of-freedom tests (implemented in IDL by Craig Markwardt as 
mpftest.pro), and through Laplace-Metropolis estimation of Bayes 
factors, for the Kally and Hall code, respectively. Both /-test 
(Kelly) and Bayes factor (Hall) calculations indicate that the two 
cluster model, with independent linear fits, is a significant improve- 
ment over fitting all points as a single cluster (>99.9% confidence). 
This is reflected in the highly reduced values of the intrinsic scat- 
ter, rro, which decreases from 0.40 for the one cluster model to 
0.22/0.11, respectively, for the higher and lower track fits for the 
two cluster model. Results of the linear regression are shown in 
Figures [6] and U\ The slope of the linear fit to the upper track in 
the two cluster model (from Kelly's code, including upper limits) 
is 0.63 ± 0.03, while the slope for the lower track is steeper, with 
0.98 ± 0.08. The Hall code gives similar slopes, with 0.60±0.04 
and 0.88±0.13, respectively for the upper and lower track for de- 



Figure 7. Results from the linear regression analysis conducted in Section 
4 (using the code by Kelly 2007). The ellipses enclose 68 per cent (solid) 
and 90 per cent (dashed) joint confidence regions for the intercept (a) and 
the slope (J3). The crosses are the best-fit values. Quoted errors on the fitted 
intercept, slope, and intrinsic scatter (cto) are at the \u level for one param- 
eter of interest. Label colors correspond to the fitted models in Figure 6, 
with black for the one cluster model and red/blue for the upper/lower track 
in the two cluster model. The intrinsic scatter is lower for the two cluster 
model, with <to = 0.23/0.10 versus eg = 0.40 for the one cluster model. 



tected points. The Kelly code returns consistent values with detec- 
tions only: the fitted slopes are 0.61 ±0.02 and 0.99±0.08 (showing 
that the upper limits are almost completely uninformative for the 
lower track). This indicates that the few upper limits do not affect 
the results, providing support for the robustness of the clustering 
analysis from which they were necessarily excluded. However, we 
wish to stress that the uncertainties on the linear regression param- 
eters are likely underestimated in that they are conditional on the 
cluster identifications. 

As an additional sanity check, we randomly selected two sub- 
samples, with sizes corresponding to those of the two genuine clus- 
ters, and verified that in this case the dual line fit was properly re- 
jected by both methods as a potential significantly improved model. 
The /-test calculations further suggest that the three cluster model, 
with independent linear fits, is at most a marginal improvement 
(p = 0.04) over the two cluster model and fits, although the Bayes 
factor suggests a more significant improvement. Regardless, the po- 
tential improvement for the three cluster model is achieved through 
a flattening of the low-luminosity segment of the upper track, which 
may be influenced by the location of the many XTE Jl 118+480 
points slightly below the upper track. Qualitatively similar results 
hold for the subset of objects with secure distance measurements, 
and in particular a separate line through the six lower track points 
provides a significant improvement to the overall fit. 
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5 SUMMARY AND DISCUSSION 

Since the claim of a universal correlation between the radio and X- 
ray luminosity for hard state BHBs (GFP03), a more complex pic- 
tures has emerged. With the addition of newly discovered sources, 
as well as new data from well studied ones, the scatter about the 
2003 best-fit relation has been steadily increasing. In fact, the anal- 
ysis presented here shows that, from a statistical point of view, the 
existence of a single universal correlation between the radio andX- 
ray luminosity of hard and quiescent state BHBs is to be dismissed. 
Our analysis, based on a number of clustering algorithms applied to 
nearly-simultaneous radio/X-ray observations of 18 BHBs, shows 
that a two cluster model, with independent linear fits, provides a 
better description of the data. The existence of two tracks does not 
depend on whether sources with highly uncertain distances are ex- 
cluded from the analysis, although the actual functional form of the 
fits does, particularly for the lower track. The slope of the linear fit 
to the upper track (0.63 ± 0.03) is still consistent, within the errors, 
with the best-fit slope (0.7 ± 0.1) given by GFP03 (later revised to 
a shallower ~ 0.6 by Gallo et al. 2006 and Corbel et al. 2008), as 
well as with the fitted slope for the black hole Fundamental Plane 
(FP) relation (0.60 ± 0.11, although a small fraction of the BHB 
data considered here is generally included in the whole FP sample). 
The lower track slope (0.98 ± 0.08) is not consistent with the upper 
track, nor it is with the widely adopted value of 1.4 for the neutron 
stars (Migliari & Fender 2006). 

These results are qualitatively insensitive to selecting sub- 
samples, to the treatment of upper limits, and, we argue below, 
to accuracy in the source distances. Uncertain distances naturally 
tend to be equal to or larger than Galactic Center distance, where 
absorption is more severe (this is true for 6 out of the 7 sources 
in the uncertain distance sample, i.e. with the exception of XTE 
J1650— 50Qjl. Notwithstanding this effect, the lower track is very 
unlikely to be purely distance-driven. If distances were systemati- 
cally over-estimated for the uncertain distance sample, this would 
make them even less luminous, effectively widening the divide with 
the upper track. On the other hand, as a result of the linearity in 
the lower track slope, inflating the uncertain distances of the lower 
track sample to larger yet values would shift the lower track to- 
wards higher luminosities along the Lx oc L r line. The average 
distance of the uncertain distance sample would have to be in- 
creased to ^ 20 kpc in order to have the lower track data points 
merge with the most luminous data points within the higher track, 
which seems unlikely, at least for all the lower track sources. 

In our analysis, we purposely chose not to group data points 
belonging to the same source, such that the clustering analysis is 
somewhat blind to specific sources. An orthogonal approach is that 
followed by Soleri & Fender (201 1), who, however, could not iden- 
tify any obvious dependence of the BHB jet power (as traced by 
their radio luminosity) on the binary system parameters (i.e. orbital 
period, disc size, inclination) and/or outburst properties. Although 
Soleri & Fender are able to reproduce the large scatter observed in 
the radio/X-ray luminosity plane by inflating the jet bulk Lorentz 
factors to values larger unity for X-ray luminosities above 10 per 
cent of the Eddington limit and assuming random inclinations, this 



6 The generally adopted distance of 2.6 kpc is simply based on the assump- 
tion that it reached a certain fraction of the Eddington luminosity during a 
soft-to-hard state transition (Maccarone 2003). 



scenario is admittedly contrived and even at odds with some of the 
measured inclination angles. 

Rather than a single correlation with large scatter, the anal- 
ysis presented here provides strong evidence for the existence of 
two tracks in the radio/X-ray luminosity plane. The actual clus- 
ter composition differs somewhat from that suggested by Coriat et 
al. (201 1) based on visual inspection of the radio/X-ray luminosity 
plane for a sub-sample of BHBs plus the neutron stars, particularly 
when it comes to the top right corner of the plane (specifically, Co- 
riat et al. single out all of the HI 743— 322 high-luminosity points 
as belonging to their lower track - see figure 5 in their paper; see 
also Kalemci et al. 2006 and Jonker et al. 2010). Most importantly, 
the appealing suggestion that the lower track may be related to ra- 
diatively efficient accretion in the hard state strongly relies on a 
slope of ~ 1.4 for the lower tracHj which is not consistent with 
the value derived in this work based on a larger sample and more 
rigorous analysis. 

Although there is no reason to believe that the two luminosity 
tracks may be a selection effect due to lack of observational cover- 
age in the gap, it should be noted the lower track occupies a region 
of the parameter space at relatively high X-ray luminosities, a fact 
which led Coriat et al. (201 1) to suggest that the two tracks may still 
collapse back to one at Eddington-scaled X-ray luminosities below 
~ 10 — . While this seems to be the case based on the behavior 
of H1743— 322, which appears to have made a transition from the 
lower to the higher track as its X-ray luminosity decreased, there 
is a small but not negligible chance that the low X-ray luminosity, 
upper track data points for this source may actually correspond to 
unresolved optically thin radio ejections (see discussion in Jonker 
et al. 2010), rather than to partially self-absorbed core radio emis- 
sion from a steady jet (if so, they ought to be discarded). Unfortu- 
nately, as discussed by Miller- Jones et al. (2011), the current up- 
per limits on the radio luminosities of most BHBs (see Figure 1) 
make it impossible to establish whether the divide may also extend 
toward very low Eddington ratios, although the low upper limits 
on the radio luminosity of GRO J1655 — 40 and (particularly) XTE 
J1550— 564 in quiescence reported by Calvelo et al. (2010) indicate 
that that might be the case. 

On the other hand, there are at least two sources that seem 
to be able to make transitions between the two tracks: beside 
H1743-322 (with the aforementioned caveats), GRO J1655-40, 
too, has two simultaneous radio-X-ray detections each falling on 
either track (and so does the newly discovered XTE J1752— 522; 
Ratti et. al., submitted). While better sampling is clearly needed in 
order to make a stronger statement than based on a handful of data 
points, we conclude that there is tentative evidence for sources be- 
ing able to make transitions from the lower to the upper track as 
they fade towards quiescence, although with no obvious parameter 



7 The suggestion that the lower track might be related to radiatively effi- 
cient accretion (Coriat et al. 2011) comes from the combination of: (i) the 
theoretical scaling between the radio and the total jet power: P r oc Lh£ 
for partially self-absorbed jets (Blandford & Konigl 1979; Falcke & Bier- 
mann 1996; Heinz & Sunyaev 2003); (ii) the empirical relation L r oc L x ' 4 . 
Under the assumption that the total jet power is directly proportional to the 
mass accretion rate, fa, the two scaling relations combine to give Lx oc fh 
for the lower track, characteristic of radiatively efficient accretion. Simi- 
larly, L r oc L x 7 leads to Lx oc rn 2 for the upper track, which would be 



indicative of radiatively inefficient accretion. 
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responsible for driving such a transition. While the option remains 
open that some of the lower track sources may actually be neu- 
tron stars, XTE J1550-564, XTE J1650-500 and GRO J1655-40 
(only one out of two data points) all sit on the lower track, and 
all have confirmed black hole primaries (Orosz et al. 2011; Orosz 
et al. 2004; Greene et al. 2001), ruling out the possibility that all 
the sources on the lower track may be neutron stars. If confirmed, 
sources 'jumping' between the two tracks would further validate 
this statement. 

We have no compelling explanation for this behavior, not for 
the very existence of the two luminosity tracks. Hysteresis effects, 
of the same kind as reported for the near-IR-X-ray correlation by 
Russell et al. (2007), are not responsible for the observed pat- 
tern. As an example, GX339— 4 is known to fall nicely on the up- 
per track over repeated outbursts cycles, both during the rise and 
the decline (e.g. Corbel et al. 2003). Casella & Pe'er (2009) as- 
cribe the large scatter about the correlation to differences in the 
strength of the jet magnetic field from source to source. However, 
there is no straightforward explanation for a dual track behavior 
in the context of this model (Pe'er & Casella 2009), nor for the 
jumping behavior. Similarly, the existence of two tracks with dif- 
ferent slopes is not expected within the jet-accretion models of 
Markoff et al. (2001,2003) and/or Yuan et al. (2005). The latter 
predicts that the slope of radio:X-ray correlation should become 
steeper (~1.2), though at Eddington-scaled X-ray luminosities be- 
low £ 10~ 5 — 10 , while we observe dual tracks at Eddington- 
scaled X-ray luminosities above ~10~ 4 (in this respect, deeper ob- 
servations of quiescent BHBs are surely desirable in order to estab- 
lish whether the lower track extends down to very low luminosities, 
as results by Calvelo et al. seem to indicate). 

Finally, as also shown by Fender et al. (2010), the existence 
of two tracks within the radio/X-ray luminosity domain of BHBs 
cannot be possibly driven by two BHB sub-samples with low and 
high spin. This conclusion remains valid even restricting the anal- 
ysis to spin measurements from reflection-fitting methods alone 
(e.g. 4U 1543—47 is the most radio loud source in our sample - 
in terms of radio to X-ray luminosity ratio - and has an inferred 
spin of 0.3 ± 0.1; the opposite is true for GRO J1655— 40, on the 
lower track with an estimated spin value of 0.98 ± 0.01; Miller et 
al. 2009) or to thermal continuum-fitting alone (e.g. A0620— 00 lies 
on the low luminosity end of the upper track, with a spin value of 
0.12 ± 0.2 while GRO J1655-40 is on the lower track with a spin 
value between 0.65 and 0.75; Gou et al. 2010, Shafee et al. 2006, 
respectively). Furthermore, even in the absence of any spin mea- 
surement, sources jumping between tracks rules out the possibility 
the black hole spin has a strong effect on the radio luminosity of 
the compact jet and hence on defining the two tracks, since the 
spin value of any given source can not possibly vary on the typ- 
ical time scales of BHB outbursts (here we strictly refer to radio 
emission from the partially self-absorbed core radio jet observed 
in the hard state, a.k.a. compact jet - while it should be noted that 
a different conclusion has been drawn by Narayan & McClintock 
2012 based on the luminosity peak of radio flares associated with 
hard- to-thermal state transitions). 

Our results confirm and strengthen the conclusions of Soleri 
& Fender (2011) that the existence of two separate tracks in the 
radio X-ray radio domain of BHBs does not reflect any obvious 
trend in any of the (known) physical parameters of the systems. The 
possibility remains open that the two tracks may reflect substan- 



tial differences in the actual average X-ray energy spectra. Pending 
availability of pointed X-ray observations with simultaneous radio 
coverage, a detailed X-ray spectral study could potentially high- 
light systematic differences between the two tracks, e.g. in terms of 
curvature in the hard X-ray spectrum and/or presence/absence of 
a optically thick disk (see Miller et al. 2006; Tomsick et al. 2009; 
Cabanac et al. 2009; Dunn et al. 201 1). 

While a theoretical interpretation of this results is beyond the 
scope of this work, the existence of two tracks in the radio/X-ray 
luminosity domain of BHBs significantly hampers the predictive 
power of any X-ray binary radio/X-ray flux measurement for the 
purpose of mass and/or distance determination. Also, it should be 
kept in mind that these two tracks are not to be considered as 
analogs to the radio-quiet and radio-loud tracks (if any) for AGN: 
as demonstrated by Kording et al. (2006b) the entirety of the ob- 
servations presented here - i.e. those of hard state BHBs - closely 
resemble AGN of the 'radio-quiet' variety, i.e. have moderate Ed- 
dington scaled X-ray luminosities and fairly stable jets, with par- 
tially self-absorbed spectra reminiscent of compact extra-galactic 
radio cores. By contrast, radio-loud AGN would be comparable to 
BHBs in the very high state, showing powerful and transient jet 
ejections with optically thin spectra, while BHs accreting between 
a few and 10 per cent of the Eddington X-ray luminosity are known 
to display a very different pattern radio:X-ray behavior (e.g. Chat- 
terjee et al. 2009, King et al. 201 1, and references therein). 

Further extrapolating these results to AGN and the FP of black 
hole activity, we argue that, while the measured scatter to the plane 
potentially allows for multiple tracks having comparable separa- 
tions as the BHB's, dealing with a third parameter (and its large 
uncertainties) introduces a further level of complexity in the prob- 
lem. Good prospects for breaking the mass degeneracy come from 
selecting a sub-sample of super-massive black holes with dynam- 
ically measured masses (Giiltekin et al. 2009), since dramatically 
reducing the errors on the mass estimate results in a much tighter 
and better calibrated relation. An additional complication is intro- 
duced by the possibility that AGN obey the FP relation only in 
a time-averaged fashion (Miller et al. 2010), and that monitoring 
super-massive black holes on time scales of months/years may in- 
troduce further deviations from the quoted best-fit line, albeit not 
necessarily of the same nature as the dual tracks discussed in this 
work for BHBs. 
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